Smart Paper Notes

Rethinking Skip Connections in Encoder-decoder Networks for Monocular Depth Estimation

Zhitong Lai , Haichao Sun , Rui Tian , Nannan Ding , Zhiguo Wu , Yanjie Wang

Categories: Computer Vision | Artificial Intelligence

2022-08-29

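Skip connections are fundamental units in encoder-decoder networks that improve feature propagation through the network. However, most methods with skip connections only connect features of the same resolution in the encoder and the decoder, which ignores the information lost in the encoder as the layers go deeper. To account for this information loss by exploiting features from the shallower encoder layers, we propose a full skip connection network (FSCN) for the monocular depth estimation task. In addition, to fuse the features within skip connections more closely, we present an adaptive concatenation module (ACM). Furthermore, we conduct extensive experiments with FSCN on outdoor and indoor datasets (i.e., the KITTI dataset and the NYU Depth dataset).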

Online Statistical Inference for Contextual Bandits via Stochastic Gradient Descent

Xi Chen , Zehua Lai , He Li , Yichen Zhang

Categories: Machine Learning (Statistics) | Machine Learning

2022-12-30

With the fast development of big data, it has become easier than before to learn the optimal decision rule by updating the decision rule recursively and making online decisions. We study the online statistical inference of model parameters in a contextual bandit framework of sequential decision-making. We propose a general framework for an online and adaptive data collection environment that can update decision rules via weighted stochastic gradient descent. We allow different weighting schemes of the stochastic gradient and establish the asymptotic normality of the parameter estimator. Our proposed estimator significantly improves the asymptotic efficiency over the previous averaged SGD approach via inverse probability weights. We also conduct an optimality analysis on the weights in a linear regression setting. We provide a Bahadur representation of the proposed estimator and show that the remainder term in the Bahadur representation entails a slower convergence rate compared to classical SGD due to the adaptive data collection.
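
As a rough illustration of weighted SGD under adaptive data collection, the sketch below runs a two-armed linear bandit with an epsilon-greedy policy and scales each stochastic gradient by the inverse of the probability with which the chosen arm was selected. Inverse probability weighting is the baseline scheme the abstract mentions, not the paper's proposed weights. The function name, the simulated environment, the step-size schedule, and all constants are assumptions made for this example.

```python
import random


def weighted_sgd_bandit(steps=5000, eps=0.2, seed=0):
    """Two-armed linear bandit; returns the final per-arm slope estimates."""
    rng = random.Random(seed)
    true_theta = [1.0, -0.5]  # hypothetical ground-truth slopes, one per arm
    theta = [0.0, 0.0]        # running estimates, updated online
    for t in range(1, steps + 1):
        x = rng.uniform(0.5, 1.5)  # scalar context observed this round
        # Epsilon-greedy policy: the action probabilities depend on the
        # current estimate, so the data collection is adaptive.
        greedy = 0 if theta[0] * x >= theta[1] * x else 1
        probs = [eps / 2, eps / 2]
        probs[greedy] += 1.0 - eps
        arm = 0 if rng.random() < probs[0] else 1
        reward = true_theta[arm] * x + rng.gauss(0.0, 0.1)
        # Gradient of the squared loss 0.5 * (theta * x - reward)^2.
        grad = (theta[arm] * x - reward) * x
        step = 0.05 / t ** 0.6  # assumed decaying step size
        # Weight the gradient by 1 / P(arm | context) before the update.
        theta[arm] -= step * grad / probs[arm]
    return theta


estimates = weighted_sgd_bandit()
print(estimates)
```

Swapping the `1 / probs[arm]` factor for another function of the selection probability gives a different weighting scheme under the same update loop.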

Discovering Customer-Service Dialog System with Semi-Supervised Learning and Coarse-to-Fine Intent Detection

Zhitong Yang , Xing Ma , Anqi Liu , Zheyu Zhang

Categories: Natural Language Processing

2022-12-23

Task-oriented dialog (TOD) aims to assist users in achieving specific goals through multi-turn conversation. Recently, good results have been obtained with large pre-trained models. However, the scarcity of labeled data hinders the efficient development of TOD systems at scale. In this work, we construct a weakly supervised dataset based on a teacher/student paradigm that leverages a large collection of unlabelled dialogues. Furthermore, we build a modular dialogue system and integrate coarse-to-fine grained classification for user intent detection. Experiments show that our method can reach the dialog goal with a higher success rate and generate more coherent responses.

Fast Converging Anytime Model Counting

Yong Lai , Kuldeep S. Meel , Roland H. C. Yap

Categories: Artificial Intelligence

2022-12-19

Model counting is a fundamental problem which has been influential in many applications, from artificial intelligence to formal verification. Due to the intrinsic hardness of model counting, approximate techniques have been developed to solve real-world instances of model counting. This paper designs a new anytime approach called PartialKC for approximate model counting. The idea is a form of partial knowledge compilation to provide an unbiased estimate of the model count which can converge to the exact count. Our empirical analysis demonstrates that PartialKC achieves significant scalability and accuracy over prior state-of-the-art approximate counters, including satss and STS. Interestingly, the empirical results show that PartialKC reaches convergence for many instances and therefore provides exact model counting performance comparable to state-of-the-art exact counters.
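
To make the terms "model count" and "unbiased estimate" concrete, here is a minimal sketch that counts the satisfying assignments of a small CNF formula by enumeration and also estimates the count by uniform sampling. This is not PartialKC and does not use knowledge compilation. The function names, the example formula, and the DIMACS-style signed-integer clause encoding are assumptions made for this example.

```python
import random
from itertools import product


def satisfies(bits, clauses):
    """True if the assignment satisfies every clause (literal k means variable |k|)."""
    return all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses)


def count_models(clauses, num_vars):
    """Exact model count by enumerating all 2^n assignments."""
    return sum(satisfies(bits, clauses)
               for bits in product([False, True], repeat=num_vars))


def estimate_models(clauses, num_vars, samples=2000, seed=0):
    """Unbiased Monte Carlo estimate: satisfied fraction times 2^n."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        bits = [rng.random() < 0.5 for _ in range(num_vars)]
        hits += satisfies(bits, clauses)
    return hits / samples * 2 ** num_vars


# (x1 OR x2) AND (NOT x1 OR x3) over three variables
formula = [[1, 2], [-1, 3]]
print(count_models(formula, 3))
print(estimate_models(formula, 3))
```

Enumeration is exponential in the number of variables, and naive sampling needs many draws when models are rare, which is the scalability gap that approximate counters aim to close.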

Multi-embodiment Legged Robot Control as a Sequence Modeling Problem

Chen Yu , Weinan Zhang , Hang Lai , Zheng Tian , Laurent Kneip , Jun Wang

Categories: Robotics

2022-12-18

Robots are traditionally bounded by a fixed embodiment during their operational lifetime, which limits their ability to adapt to their surroundings. Co-optimizing control and morphology of a robot, however, is often inefficient due to the complex interplay between the controller and morphology. In this paper, we propose a learning-based control method that can inherently take morphology into consideration such that once the control policy is trained in the simulator, it can be easily deployed to robots with different embodiments in the real world. In particular, we present the Embodiment-aware Transformer (EAT), an architecture that casts this control problem as conditional sequence modeling. EAT outputs the optimal actions by leveraging a causally masked Transformer. By conditioning an autoregressive model on the desired robot embodiment, past states, and actions, our EAT model can generate future actions that best fit the current robot embodiment. Experimental results show that EAT can outperform all other alternatives in embodiment-varying tasks, and succeed in an example of real-world evolution tasks: stepping down a stair through updating the morphology alone. We hope that EAT will inspire a new push toward real-world evolution across many domains, where algorithms like EAT can blaze a trail by bridging the field of evolutionary robotics and big data sequence modeling.
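
The causal masking mentioned in the abstract can be sketched in a few lines: position i may attend only to positions 0 through i, so future tokens receive zero attention weight. This is a generic illustration of a causally masked softmax, not the EAT architecture. The function name and the list-of-lists score format are assumptions made for this example.

```python
import math


def causal_attention_weights(scores):
    """Row-wise softmax over an n x n score matrix with future positions masked out."""
    n = len(scores)
    weights = []
    for i, row in enumerate(scores):
        visible = row[: i + 1]   # position i sees positions 0..i only
        peak = max(visible)      # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in visible]
        total = sum(exps)
        # Masked (future) positions get exactly zero weight.
        weights.append([e / total for e in exps] + [0.0] * (n - i - 1))
    return weights


for row in causal_attention_weights([[0.0, 0.0, 0.0]] * 3):
    print(row)
```

In a sequence model over embodiment, state, and action tokens, a mask of this kind keeps each predicted action from depending on tokens that come later in the sequence.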

Werewolf Among Us: A Multimodal Dataset for Modeling Persuasion Behaviors in Social Deduction Games

Bolin Lai , Hongxin Zhang , Miao Liu , Aryan Pariani , Fiona Ryan , Wenqi Jia , Shirley Anugrah Hayati , James M. Rehg , Diyi Yang

Categories: Machine Learning | Natural Language Processing | Computer Vision

2022-12-16

Persuasion modeling is a key building block for conversational agents. Existing works in this direction are limited to analyzing textual dialogue corpus. We argue that visual signals also play an important role in understanding human persuasive behaviors. In this paper, we introduce the first multimodal dataset for modeling persuasion behaviors. Our dataset includes 199 dialogue transcriptions and videos captured in a multi-player social deduction game setting, 26,647 utterance level annotations of persuasion strategy, and game level annotations of deduction game outcomes. We provide extensive experiments to show how dialogue context and visual signals benefit persuasion strategy prediction. We also explore the generalization ability of language models for persuasion modeling and the role of persuasion strategies in predicting social deduction game outcomes. Our dataset, code, and models can be found at https://persuasion-deductiongame.socialai-data.org.

Sim-to-Real Transfer for Quadrupedal Locomotion via Terrain Transformer

Hang Lai , Weinan Zhang , Xialin He , Chen Yu , Zheng Tian , Yong Yu , Jun Wang

Categories: Robotics | Machine Learning

2022-12-15

Deep reinforcement learning has recently emerged as an appealing alternative for legged locomotion over multiple terrains by training a policy in physical simulation and then transferring it to the real world (i.e., sim-to-real transfer). Despite considerable progress, the capacity and scalability of traditional neural networks are still limited, which may hinder their applications in more complex environments. In contrast, the Transformer architecture has shown its superiority in a wide range of large-scale sequence modeling tasks, including natural language processing and decision-making problems. In this paper, we propose Terrain Transformer (TERT), a high-capacity Transformer model for quadrupedal locomotion control on various terrains. Furthermore, to better leverage Transformer in sim-to-real scenarios, we present a novel two-stage training framework consisting of an offline pretraining stage and an online correction stage, which can naturally integrate Transformer with privileged training. Extensive experiments in simulation demonstrate that TERT outperforms state-of-the-art baselines on different terrains in terms of return, energy consumption and control smoothness. In further real-world validation, TERT successfully traverses nine challenging terrains, including sand pit and stair down, which strong baselines cannot accomplish.

Modeling Multimodal Aleatoric Uncertainty in Segmentation with Mixture of Stochastic Expert

Zhitong Gao , Yucong Chen , Chuyu Zhang , Xuming He

Categories: Computer Vision

2022-12-14

Equipping predicted segmentation with calibrated uncertainty is essential for safety-critical applications. In this work, we focus on capturing the data-inherent uncertainty (aka aleatoric uncertainty) in segmentation, typically when ambiguities exist in input images. Due to the high-dimensional output space and potential multiple modes in segmenting ambiguous images, it remains challenging to predict well-calibrated uncertainty for segmentation. To tackle this problem, we propose a novel mixture of stochastic experts (MoSE) model, where each expert network estimates a distinct mode of the aleatoric uncertainty and a gating network predicts the probabilities of an input image being segmented in those modes. This yields an efficient two-level uncertainty representation. To learn the model, we develop a Wasserstein-like loss that directly minimizes the distribution distance between the MoSE and ground truth annotations. The loss can easily integrate traditional segmentation quality measures and be efficiently optimized via constraint relaxation. We validate our method on the LIDC-IDRI dataset and a modified multimodal Cityscapes dataset. Results demonstrate that our method achieves the state-of-the-art or competitive performance on all metrics.

Generating extreme quantum scattering in graphene with machine learning

Chen-Di Han , Ying-Cheng Lai

Categories: Machine Learning

2022-12-13

Graphene quantum dots provide a platform for manipulating electron behaviors in two-dimensional (2D) Dirac materials. Most previous works were of the "forward" type in that the objective was to solve various confinement, transport and scattering problems with given structures that can be generated by, e.g., applying an external electrical field. There are applications such as cloaking or superscattering where the challenging problem of inverse design needs to be solved: finding a quantum-dot structure according to certain desired functional characteristics. A brute-force search of the system configuration based directly on the solutions of the Dirac equation is computationally infeasible. We articulate a machine-learning approach to addressing the inverse-design problem where artificial neural networks subject to physical constraints are exploited to replace the rigorous Dirac equation solver. In particular, we focus on the problem of designing a quantum dot structure to generate both cloaking and superscattering in terms of the scattering efficiency as a function of the energy. We construct a physical loss function that enables accurate prediction of the scattering characteristics. We demonstrate that, in the regime of Klein tunneling, the scattering efficiency can be designed to vary over two orders of magnitude, allowing any scattering curve to be generated from a proper combination of the gate potentials. Our physics-based machine-learning approach can be a powerful design tool for 2D Dirac material-based electronics.

Multi-scale Feature Imitation for Unsupervised Anomaly Localization

Chao Hu , Shengxin Lai

Categories: Computer Vision | Artificial Intelligence

2022-12-12

Unsupervised anomaly localization faces three challenges: no anomalous samples are available for training, multiple types of anomalies must be detected, and the proportion of the image area that anomalies occupy varies. To address these problems, we propose a separate teacher-student feature imitation network structure and a multi-scale processing strategy that combines an image pyramid and a feature pyramid. We also propose a network module importance search method based on gradient descent optimization to simplify the network structure. Experimental results on a real industrial product detection dataset show that the proposed algorithm performs better than contemporaneous feature-modeling anomaly localization methods. The multi-scale strategy effectively improves results compared with the benchmark method.
